Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT
In this paper, we aimed to provide a review and tutorial for researchers in
the field of medical imaging using language models to improve their tasks at
hand. We began by providing an overview of the history and concepts of language
models, with a special focus on large language models. We then reviewed the
current literature on how language models are being used to improve medical
imaging, emphasizing different applications such as image captioning, report
generation, report classification, finding extraction, visual question
answering, interpretable diagnosis, and more for various modalities and organs.
ChatGPT was highlighted in particular so that researchers can explore more potential
applications. We covered the potential benefits of accurate and efficient
language models for medical imaging analysis, including improving clinical
workflow efficiency, reducing diagnostic errors, and assisting healthcare
professionals in providing timely and accurate diagnoses. Overall, our goal was
to bridge the gap between language models and medical imaging and inspire new
ideas and innovations in this exciting area of research. We hope that this
review paper will serve as a useful resource for researchers in this field and
encourage further exploration of the possibilities of language models in
medical imaging.
Multiuser Resource Allocation for Semantic-Relay-Aided Text Transmissions
Semantic communication (SemCom) is an emerging technology that extracts
useful meaning from data and sends only relevant semantic information. Thus, it
has great potential to improve the spectrum efficiency of conventional
wireless systems with bit transmissions, especially in low signal-to-noise
ratio (SNR) and small-bandwidth regions. However, existing works have
mostly overlooked the constraints of mobile devices, which may not have
sufficient capabilities to implement resource-demanding deep-learning-based
semantic encoders/decoders. To address this issue, we propose in
this paper a new semantic relay (SemRelay), which is equipped with a semantic
receiver to assist multiuser text transmissions. Specifically, the SemRelay
decodes semantic information from a base station and forwards it to the users
using conventional bit transmission, hence effectively improving text
transmission efficiency. To study the multiuser resource allocation, we
formulate an optimization problem to maximize the multiuser weighted sum-rate
by jointly designing the SemRelay transmit power allocation and system
bandwidth allocation. Although this problem is non-convex and hence challenging
to solve, we propose an efficient algorithm that obtains a high-quality
suboptimal solution by using the block coordinate descent method. Finally,
numerical results show the effectiveness of the proposed algorithm as well as
the superior performance of the proposed SemRelay over the conventional
decode-and-forward (DF) relay, especially in the small-bandwidth region.
Comment: 6 pages, 3 figures, accepted for IEEE Global Communication Conference
(GLOBECOM) 2023 Workshop on Semantic Communication for 6
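The alternating structure of block coordinate descent mentioned in the abstract above can be illustrated with a toy two-user example. This is only a sketch under loud assumptions: it uses the classical Shannon rate rather than the paper's semantic rate model, and the names `weighted_sum_rate`, `argmax_1d`, `block_coordinate_descent` and all parameter values are hypothetical. The toy objective is concave in each block, so a one-dimensional ternary search suffices for each update; the paper's actual problem is non-convex and its block updates are not described here.

```python
import math


def weighted_sum_rate(alpha, beta, w, g, P, B, N0):
    """Weighted sum of Shannon rates for two users (toy model, an assumption).

    alpha: fraction of total power P given to user 0.
    beta:  fraction of total bandwidth B given to user 0.
    """
    powers = (alpha * P, (1.0 - alpha) * P)
    bands = (beta * B, (1.0 - beta) * B)
    total = 0.0
    for wk, gk, pk, bk in zip(w, g, powers, bands):
        if pk > 0.0 and bk > 0.0:  # a user with no power or bandwidth gets rate 0
            total += wk * bk * math.log2(1.0 + pk * gk / (bk * N0))
    return total


def argmax_1d(f, lo=0.0, hi=1.0, iters=100):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def block_coordinate_descent(w, g, P, B, N0, iters=20):
    """Alternate between the power block (alpha) and the bandwidth block (beta)."""
    alpha, beta = 0.5, 0.5
    history = [weighted_sum_rate(alpha, beta, w, g, P, B, N0)]
    for _ in range(iters):
        # Block 1: optimize the power split with the bandwidth split held fixed.
        alpha = argmax_1d(lambda a: weighted_sum_rate(a, beta, w, g, P, B, N0))
        # Block 2: optimize the bandwidth split with the power split held fixed.
        beta = argmax_1d(lambda b: weighted_sum_rate(alpha, b, w, g, P, B, N0))
        history.append(weighted_sum_rate(alpha, beta, w, g, P, B, N0))
    return alpha, beta, history


if __name__ == "__main__":
    a, b, hist = block_coordinate_descent(
        w=(1.0, 2.0), g=(1.0, 0.5), P=1.0, B=1.0, N0=0.1
    )
    print(f"power split={a:.3f}, bandwidth split={b:.3f}, objective={hist[-1]:.4f}")
```

Because each block update maximizes the objective with the other block fixed, the recorded objective values should never decrease from one iteration to the next, which is the property that makes this kind of alternating scheme attractive for coupled power and bandwidth variables.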
Pixel-wise Graph Attention Networks for Person Re-identification
Graph convolutional networks (GCNs) are widely used to handle irregular data
since they update node features using the structural information of the graph.
With iterated GCNs, high-order information can be obtained to
further enhance the representation of nodes. However, how to apply GCNs to
structured data (such as pictures) has not been deeply studied. In this paper,
we explore the application of graph attention networks (GAT) in image feature
extraction. First of all, we propose a novel graph generation algorithm to
convert images into graphs through matrix transformation. It is one order of
magnitude faster than the algorithm based on K Nearest Neighbors (KNN). Then, GAT is used
on the generated graph to update the node features. Thus, a more robust
representation is obtained. These two steps are combined into a module called
pixel-wise graph attention module (PGA). Since the graph obtained by our graph
generation algorithm can still be transformed into a picture after processing,
PGA can be well combined with CNN. Based on these two modules, we drew on
ResNet to design a pixel-wise graph attention network (PGANet). The PGANet is
applied to the task of person re-identification on the datasets Market1501,
DukeMTMC-reID and Occluded-DukeMTMC (outperforming the state of the art by 0.8%,
1.1% and 11% respectively, in mAP scores). Experimental results show that it
achieves state-of-the-art performance.
The code is available at https://github.com/wenyu1009/PGANet
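To make the idea of running graph attention over pixels concrete, here is a minimal sketch. It is not the authors' graph generation algorithm, which the abstract only describes as a matrix transformation: the 4-connected grid neighbourhood, the identity feature transform, and the names `grid_neighbors`, `gat_layer`, `a_src` and `a_dst` are all assumptions made for illustration. The attention score follows the usual GAT form, a LeakyReLU of a learned vector applied to the concatenated features of a node pair, normalized with a softmax over each node's neighbours.

```python
import math


def grid_neighbors(h, w):
    """Map each pixel (node index r * w + c) to itself and its 4-connected neighbours."""
    nbrs = {}
    for r in range(h):
        for c in range(w):
            cand = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[r * w + c] = [
                rr * w + cc for rr, cc in cand if 0 <= rr < h and 0 <= cc < w
            ]
    return nbrs


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(feats, nbrs, a_src, a_dst):
    """Single-head graph attention update with an identity feature transform.

    feats: list of per-pixel feature vectors, one per node.
    a_src, a_dst: the two halves of the attention vector, so that
                  a^T [h_i || h_j] = a_src . h_i + a_dst . h_j.
    """
    out = []
    for i, h_i in enumerate(feats):
        scores = [
            leaky_relu(dot(a_src, h_i) + dot(a_dst, feats[j])) for j in nbrs[i]
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(
            [
                sum(e / z * feats[j][d] for e, j in zip(exps, nbrs[i]))
                for d in range(len(h_i))
            ]
        )
    return out


if __name__ == "__main__":
    h, w = 2, 3
    feats = [[float(i), 1.0] for i in range(h * w)]  # toy 2-channel feature map
    updated = gat_layer(feats, grid_neighbors(h, w), a_src=[0.1, -0.2], a_dst=[0.3, 0.4])
    print(updated)
```

The output has one feature vector per pixel in the same row-major order as the input, so it can be reshaped back into an h by w feature map and passed to ordinary convolutional layers, which mirrors the abstract's point that the processed graph can still be turned back into a picture.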